Reporting
The Reporting module is responsible for managing manual quality review processes within the CogSol platform.
It allows evaluators and administrators to create structured reports about assistant responses, enabling the continuous improvement of conversational performance and the generation of curated evaluation datasets.
What is Reporting?
Reporting is the process of manually reviewing and classifying assistant interactions to identify correct answers, incorrect behaviors, and cases where the assistant could not provide information.
These reports serve two main purposes:
- Quality Assurance: providing direct feedback on assistant performance.
- Dataset Curation: generating labeled examples that can be reused in training and evaluation.
By integrating Reporting with Evaluator and Analyzer modules, CogSol ensures a closed feedback loop between observation, measurement, and improvement.
Purpose of Reporting
The Reporting module aims to:
- Capture human evaluations of assistant responses in structured form.
- Identify errors, missing information, or correct behaviors that automatic evaluation might not fully capture.
- Provide traceable and categorized reports that feed into broader performance analytics.
- Build datasets for future evaluation or retraining.
- Support continuous learning cycles between production and evaluation environments.
Report Creation
Reports are generated from a chat visualization that displays past assistant–user conversations and allows reviewers to evaluate responses.
While reviewing a conversation, reviewers can open a report form and classify the response as:
- Correct Report – when the assistant responded accurately or appropriately.
- Wrong Report – when the assistant’s answer was incorrect or incomplete.
- No Info Report – when the assistant was unable to provide an answer.
Each report includes configurable fields such as topic, evaluation type, severity level (for Wrong reports only), and optional comments for contextual feedback.
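As a concrete illustration, a single report could be represented as a small record like the sketch below. The field names, enum values, and validation rule are assumptions for illustration only, not CogSol's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReportType(Enum):
    CORRECT = "correct"
    WRONG = "wrong"
    NO_INFO = "no_info"


class Severity(Enum):
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


@dataclass
class Report:
    """Hypothetical shape of a single manual review report."""
    report_type: ReportType
    topic: str
    evaluation_type: str
    comment: Optional[str] = None
    severity: Optional[Severity] = None  # only meaningful for Wrong reports

    def __post_init__(self) -> None:
        # Enforce the rule above: severity applies only to Wrong reports.
        if self.severity is not None and self.report_type is not ReportType.WRONG:
            raise ValueError("severity is only valid for Wrong reports")


# Example: a Wrong report with a major severity.
example = Report(
    report_type=ReportType.WRONG,
    topic="billing",
    evaluation_type="manual review",
    severity=Severity.MAJOR,
    comment="Quoted an outdated refund policy.",
)
```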
Report Types
Correct Report
Used when the assistant’s response is valid and aligned with expectations.
Options include:
- Marking the response as correct even when the assistant answered that no information was available (i.e., a “no info” answer was the right one).
- Saving the case as part of the evaluation dataset for regression or training purposes.
Wrong Report
Used when the assistant’s answer is inaccurate, misleading, or incomplete.
Requires classification by:
- Response type (e.g., the content was incorrect; relevant information existed in the platform but was not used; or the question required an answer but it was unclear whether the platform contained sufficient content to respond).
- Severity level (minor, major, or critical).
- Topic and comments to specify the nature of the issue.
No Info Report
Used when the assistant indicates a lack of information or coverage on a topic.
Allows reviewers to specify:
- How the case should be classified: the assistant handled the limitation gracefully (acknowledging it); relevant information existed in the platform but was not used; information needs to be added to the platform; or the question required an answer but it was unclear whether the platform contained sufficient content to respond.
- Topic and comments to specify the nature of the issue.
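The classification options listed above lend themselves to fixed enumerations. The labels in this sketch are assumptions chosen for illustration, not the platform's actual values.

```python
from enum import Enum


class WrongResponseType(Enum):
    """Assumed labels for the Wrong report classification described above."""
    INCORRECT_CONTENT = "incorrect_content"        # the answer itself was wrong
    CONTENT_EXISTED_NOT_USED = "content_not_used"  # relevant info existed but was not used
    COVERAGE_UNCLEAR = "coverage_unclear"          # unclear whether the platform could answer


class NoInfoHandling(Enum):
    """Assumed labels for the No Info report classification described above."""
    GRACEFUL = "graceful"                          # assistant acknowledged the limitation
    CONTENT_EXISTED_NOT_USED = "content_not_used"  # relevant info existed but was not used
    CONTENT_MISSING = "content_missing"            # info should be added to the platform
    COVERAGE_UNCLEAR = "coverage_unclear"          # unclear whether the platform could answer
```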
Metrics and Dashboard
The Reporting dashboard consolidates all reports into quantitative metrics, providing a clear overview of assistant performance and review activity.
Key Metrics
- Total Reviewed Cases: total number of manually reviewed interactions.
- Accuracy: percentage of correct responses among all reviewed cases. This includes:
  - cases reported as correct,
  - cases where the assistant correctly stated that no information was available, and
  - cases where the response was correct but the platform is missing supporting content that should be added.
- Missing Info Rate: proportion of cases where the assistant lacked sufficient information to respond. This includes cases where the “no information available” response was correct, but additional content should be incorporated into the platform.
- Error Rate: proportion of incorrect or unsatisfactory responses. This includes cases where:
  - the assistant responded incorrectly using available information, or
  - the assistant stated that there was no information when the relevant content was actually present in the platform.
- Total Reports: cumulative number of responses flagged as erroneous.
- Successful Cases: responses marked as correct.
- Open Reports: reports pending review or corrective action.
- Resolved Reports: reports reviewed and closed after corrective action was completed.
These metrics can be filtered by assistant, enabling cross-comparison between different deployed models or environments.
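To make the relationships between these rates concrete, here is a minimal sketch of how they could be computed from raw review counts. The count names and groupings are an interpretation of the definitions above, not the dashboard's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ReviewCounts:
    """Assumed raw counts from the reporting database; names are illustrative."""
    correct: int                  # responses reported as Correct
    no_info_graceful: int         # "no info" was the right answer, handled gracefully
    no_info_content_missing: int  # "no info" was right, but content should be added
    wrong_incorrect: int          # incorrect answer despite available information
    wrong_false_no_info: int      # claimed "no info" although content existed


def dashboard_metrics(c: ReviewCounts) -> dict[str, float]:
    total = (c.correct + c.no_info_graceful + c.no_info_content_missing
             + c.wrong_incorrect + c.wrong_false_no_info)
    accurate = c.correct + c.no_info_graceful + c.no_info_content_missing
    missing_info = c.no_info_graceful + c.no_info_content_missing
    errors = c.wrong_incorrect + c.wrong_false_no_info
    # Note: Accuracy and Missing Info Rate overlap by design, since a correct
    # "no info" answer counts toward both, per the definitions above.
    return {
        "accuracy": accurate / total,
        "missing_info_rate": missing_info / total,
        "error_rate": errors / total,
    }


# 100 reviewed cases: accuracy 0.92, missing info rate 0.12, error rate 0.08.
print(dashboard_metrics(ReviewCounts(
    correct=80, no_info_graceful=8, no_info_content_missing=4,
    wrong_incorrect=6, wrong_false_no_info=2)))
```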
Workflow
- Interaction Review: the reviewer selects an assistant and engages in or replays a conversation.
- Report Creation: the reviewer opens a report from the relevant area of the conversation, depending on the issue type.
- Classification: the reviewer fills in the report form (response type, topic, severity, comments).
- Submission: once submitted, the report becomes part of the centralized reporting database.
- Monitoring: aggregate metrics are updated and displayed in the dashboard.
This process ensures every report contributes to the platform’s knowledge base and evaluation datasets.
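If reports are submitted over HTTP, the submission step could reduce to a single request, as in the sketch below. The endpoint URL, payload keys, and example values are hypothetical, not CogSol's documented API.

```python
import requests

# Hypothetical endpoint and payload; CogSol's actual reporting API may differ.
COGSOL_REPORTS_URL = "https://cogsol.example.com/api/reports"

payload = {
    "assistant_id": "assistant-42",       # which deployed assistant was reviewed
    "conversation_id": "conv-1234",       # the reviewed conversation
    "report_type": "wrong",               # correct | wrong | no_info
    "response_type": "content_not_used",  # classification (see Report Types)
    "severity": "major",                  # Wrong reports only
    "topic": "billing",
    "comment": "Quoted an outdated refund policy.",
}

resp = requests.post(COGSOL_REPORTS_URL, json=payload, timeout=10)
resp.raise_for_status()  # submitted reports start in the "open" state on the dashboard
```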
Benefits of Reporting
- Human validation: complements automatic evaluation with qualitative human judgment.
- Dataset enrichment: converts real-world interactions into labeled data for retraining and evaluation.
- Transparency: maintains traceability for each reviewed case.
- Operational insight: enables monitoring of assistant reliability across domains or time periods.
- Continuous improvement: closes the loop between evaluation, reporting, and model refinement.